Table Structure Identification from Document Images: A Survey
Abstract
Table structure identification has received significant research attention in the past few years. Because OCR (Optical Character Recognition) is prone to many errors on document images, it is important to make logical objects such as tables explicit. A deep understanding of a document's contents therefore requires algorithms that properly analyze its structure. In this research paper, I describe various methods and algorithms that segment scanned images into different blocks and detect any tabular structure, in whatever form, that may be present in the document.

INTRODUCTION

A large number of pages must be scanned and analyzed to create document image libraries for real-world applications. Creating a document image library involves a chain of thorough and intensive activities such as scanning, pre-processing, segmentation, layout analysis, storage and retrieval. Although this is the most researched field in the domain of Document Image Analysis (DIA), its problems have yet to be solved to the desired level of accuracy and efficiency. Many methods have been proposed for identifying tabular structures in document images, each with its own benefits and limitations. When searching a document image for tables, first classifying a page by whether it contains any tabular structure leads to better segmentation at a lower computing cost. The term “tabular structure” refers to a table-like arrangement of content. In this research paper we present a brief review of past work in the table category.

Table: A table contains at least two rows and two columns, which may be fully or partially enclosed in boxes formed by horizontal and vertical rule lines. Table detection and segmentation have been approached in several ways over time.
The algorithms may be classified broadly into two types:
1. Based on the presence of rule lines in the table.
2. Based on knowledge of the table layout.

Identification of Table: Our main topic of interest is the identification of tables, which can be done in the following steps:
1. Table Detection: locating the regions of a document with tabular content.
2. Table Structure Recognition: reconstructing the cellular structure of a table.
3. Table Interpretation: rediscovering the meaning of the tabular structure. This includes:

International Journal of Innovations & Advancement in Computer Science
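For the second family of methods, which handle tables without rule lines, one layout cue commonly exploited is white space: table rows are separated by horizontal gaps and columns by vertical gaps. The Python sketch below illustrates only the structure-recognition step under that assumption; it is not the algorithm of any specific surveyed paper. It projects a binarised, already-detected table region onto both axes, splits each projection profile at white gaps (the minimum gap widths `min_row_gap` and `min_col_gap` are hypothetical parameters), and returns the grid of cell intervals. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binarised crop: 1 = ink pixel, 0 = background
Span = Tuple[int, int]   # inclusive (start, end) pixel interval


def ink_spans(profile: List[int], min_gap: int) -> List[Span]:
    """Split a projection profile into ink spans.

    Runs of at least `min_gap` empty positions separate spans; shorter gaps
    (e.g. between the characters of one word) are bridged.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    last_ink = 0
    for i, value in enumerate(profile):
        if value == 0:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            spans.append((start, last_ink))
            start = i
        last_ink = i
    if start is not None:
        spans.append((start, last_ink))
    return spans


def recognize_grid(img: Image, min_row_gap: int = 1,
                   min_col_gap: int = 3) -> List[List[Tuple[Span, Span]]]:
    """Return cells as grid[row][col] = (row_span, col_span), or [] when the
    region has fewer than the two rows and two columns a table needs."""
    row_profile = [sum(row) for row in img]          # ink count per pixel row
    col_profile = [sum(col) for col in zip(*img)]    # ink count per pixel column
    rows = ink_spans(row_profile, min_row_gap)
    cols = ink_spans(col_profile, min_col_gap)
    if len(rows) < 2 or len(cols) < 2:
        return []
    return [[(r, c) for c in cols] for r in rows]


# Usage: two text lines, each with two "words" separated by a wide white gap.
text = ["###....###..",
        "............",
        "##.....####."]
img = [[1 if ch == "#" else 0 for ch in line] for line in text]
for row in recognize_grid(img):
    print(row)
# [((0, 0), (0, 2)), ((0, 0), (7, 10))]
# [((2, 2), (0, 2)), ((2, 2), (7, 10))]
```

Because the column profile is taken over the whole region, this assumes every column gap runs through all rows; spanning headers, multi-line cells or ragged alignment would need a more careful, cell-level analysis.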
Similar resources
A qualitative study analyzing the pattern of use of medical images by health-sector experts
Introduction: In the health sector, an image functions as a form of document that can convey a considerable amount of information. Employing this type of information can increase the effectiveness of medical experts' performance. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...
A Unified Algorithm for Identification of Various Tabular Structures from Document Images
This paper presents a unified algorithm for segmentation and identification of various tabular structures from document page images. Such tabular structures include conventional tables and displayed mathzones, as well as Table of
Identification of Item Fields in Table-form Documents with/without Line Segments
Many methods for recognizing the layout structures of table-form documents have been proposed to date. Most of them interpret table-form document images using knowledge that is adaptable to the specification of layout structures of individual table-form documents. H. Naruse et al. proposed a successful method, based on neighboring/connective relationships among item fields in table-form do...
Geometric Structure Analysis of Document Images: A Knowledge-Based Approach
Geometric structure analysis is a prerequisite for creating electronic documents from logical components extracted from document images. This paper presents a knowledge-based method for sophisticated geometric structure analysis of technical journal pages. The proposed knowledge base encodes geometric characteristics that are not only common in technical journals but also publication-specific in ...
Automated Detection and Segmentation of Table of Contents Page from Document Images
With the aim of extracting structural information from the table of contents (TOC) to help develop digital document libraries, the need to identify/segment the TOC page is obvious. The objective of creating a digital document library is to provide a non-labour-intensive, cheap and flexible way of storing, representing and managing paper documents in electronic form to facilitate indexi...